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ABSTRACT 

Finite-difference  approximations  to  derivatives  are  useful  not  only  in  op¬ 
timisation  algorithms,  but  also  in  other  circumstances  such  as  sensitivity  analysis. 
In  this  paper  we  discuss  methods  for  estimating  the  relative  cancellation  error 
and  relative  truncation  error  in  a  finite-difference  approximation  and  propose  a 
technique  for  computing  the  finite-difference  interval  so  that  the  bounds  upon  the 
errors  are  balanced.  We  also  propose  a  method  for  choosing  the  finite-difference 
interval  in  a  quasi-Newton  algorithm  for  unconstrained  minimisation  that  uses 
function  values  only. 

* 


G  1SS0  SpUai  OpUminUoi  Labmtwjr 


v- 


4 


§1 


INTRODUCTION 


1 


1.  Introduction 

• 

Let  F(x),  x  e  R*,  be  a  twice  continuously  differentiable  nonlinear  function.  If  the 
derivatives  of  F{z)  are  too  difficult  or  expensive  to  enluatef  it  it  now  generally 
accepted  that  the  best  method!  available  for  minimising  F(t)  when  n  it  not  too 
large  are  based  on  using  quasi-Newton  methods  with  finite-difference  approxima¬ 
tions  to  the  gradient  vector.  However,  the  success  of  such  algorithms  critically 
depends  on  obtaining  "good*  approximations  to  the  necessary  first  derivatives 
—  much  more  so  than  with  Newton-type  methods  that  use  finite  differences  of 
gradients  to  approximate  second  derivatives.  As  we  shall  see,  certain  standard 
choices  for  the  finite-difference  interval  produce  acceptable  approximations  for 
well  scaled  problems.  However,  they  may  be  disastrous  in  the  presence  of  bad 
scaling  or  a  non-typical  starting  point.  Although  no  procedure  is  guaranteed 
for  every  case,  the  methods  suggested  in  this  paper  are  designed  to  overcome 
the  most  common  difficulties  in  choosing  the  finite-difference  interval,  and  can 
lead  to  substantial  improvements  in  performance  of  the  associated  minimisation 
algorithms. 

Finite-difference  approximations  to  the  gradient  vector  are  derived  from  ex¬ 
panding  F  in  a  Taylor  series  along  an  appropriate  vector,  i.e.: 

F(t  +  kfij)  =  F(x)  +  »,„(*)  +  +  0(h*),  (1) 

where  Ay  is  the  finite-difference  interval  corresponding  to  the  y-th  variable,  ey  is 
the  y-th  column  of  the  identity  matrix,  gy  is  the  y-th  component  of  the  gradient 
vector  g(x),  and  Gyy  is  the  y-th  diagonal  element  of  the  Hessian  matrix  G(x). 
The  two  most  common  finite-difference  formulae  are: 

(i)  the  forward-difference  formula 

9i  w  +  hjtj)  -  #*(*));  (2) 


(ii)  the  central-difference  formula 

9i  «  +  M/)  ~  F(*  ”  Mf)).  (*) 

where  Ay  =  fy(l  -f-  jxyj).  (Note  that  Ay  is  an  absolute  perturbation  and  that  fy 
is  a  relative  perturbation.  When  0  <  |*y|  <  1,  they  are  essentially  equivalent.) 

The  truncation  error  in  (2)  —  i.e.,  the  error  due  to  neglecting  terms  of  the 
series  (1)  —  is  O(Ay),  and  the  truncation  error  associated  with  formula  (3)  is 
0{hty.  The  higher  order  error  of  the  central-difference  formula  is  obtained  at  the 
cost  of  an  additional  function  evaluation  for  each  component  of  the  gradient. 
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The  mein  consideration  in  the  implementation  of  finite-difference  techniques 
is  the  choice  of  the  n  mines  {hj}.  8ince  the  truncstion  error  in  either  (2)  or  (8) 
decreeses  with  kj,  it  might  eppesr  that  choosing  the  finite-diffireace  interval  ns 
smell  es  possible  will  minimise  the  error.  However,  es  hj  becomes  smeller,  the 
function  values  in  (2)  end  (8)  become  closer  end,  es  we  shell  see,  computational 
error  becomes  incrmingiy  important.  In  this  peper  we  consider  techniques  for 
choosing  hj  effectively  that  tain  into  account  beta  sources  of  error. 

We  shell  assume  that  ell  computation  is  carried  out  on  e  machine  that  stores 
a  non-sero  representable  number  /  in  the  form 

}  =  (4) 

where  P  is  the  machine  base,  s  is  the  exponent  and  m  is  a  mantissa  comprising 
r  digits.  All  floating-point  numbers  are  normalised  such  that 

i<M<  l. 

P 

It  is  convenient  (though  not  essential)  to  assume  that  the  machine  performs  cor¬ 
rect  rounding,  that  is,  if  /  denotes  the  floating-point  value  of  a  non-sero  number 
/, then 

tzl  <  Igi-t 

/  S  2P 

The  number  Pl~r  is  termed  the  relative  machine  precision  and  will  be  denoted 
bj  c.  For  a  complete  discussion  of  the  errors  in  floating-point  arithmetic,  see 
Wilkinson  (1963). 

Throughout  this  paper,  it  will  be  important  to  distinguish  “exact*  from 
“computed*  quantities.  Let  7  denote  an  exact  quantity,  and  let  7  denote  an 
approximation  that  satisfies  7  =  (1  +  0)7;  the  value  |o|  will  be  termed  the 
relative  error  in  the  approximation. 


2.  Estimating  the  cancellation  error 

Numbers  computed  in  finite  precision  necessarily  deviate  from  the  correspond¬ 
ing  “exact"  quantities.  Fortunately,  each  occurrence  of  rounding  or  floating¬ 
point  arithmetic  typically  introduces  a  relative  error  bounded  by  the  order  of 
machine  precision,  and  in  most  instances  this  level  of  error,  even  accumulated 
over  many  operations,  is  “negligible*.  However,  certain  computations  carry  the 
risk  of  introducing  much  greater  relative  error  —  in  particular,  the  subtraction 
of  nearly  equal  rounded  numbers.  The  error  ttsoeiaied  with  this  procedure  it 
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termed  cancellation  error  (tee  Kahan,  1978,  for  e  detailed  treatment  of  related 
topics). 

Consider  two  numbers  /i  and  /a,  whose  floating-point  values  are  /j  **  /i(l+ 
ci)  and  /a  =  /a(l  +  <a)»  where  c*  and  c*  are  bounded  in  magnitude  by  the 
relative  machine  precision.  The  exact  difference  of  A  and  /*  can  be  written  at: 

A /  s  /t(l  +  €j)  -  Ml  +  «a )  =  (/»-  Ml  +  9),  (5) 

so  that  9  represents  the  relative  error  in  A/  with  respect  to  the  exact  difference 
of  the  original  numbers. 

It  ft  =  /a,  we  say  that  complete  cancellation  occurs.  Otherwise,  re-arranging 
(5)  gives  an  expression  for  9: 


es/a  —  «a/a 

" — *=7T- 


The  relative  error  in  A/  may  therefore  be  bounded  as  follows: 

Ui/i  — ea/a| 


1*1  = 


/i-/a 
_  I  <a(/i  ~  /a)  +  /i(ci  —  <a) 


/»”*/a 


(«) 


(7) 


assuming  that  |/i|  >  |/a|. 

If  |/i  —  /a|  is  small  relative  to  |/i|  (i.e.,  S\  “nearly  equals”  /a),  (7)  shows 
that  the  relative  error  in  A/  is  not  restricted  to  be  of  order  e.  The  error  may 
be  large  not  because  of  errors  in  subtracting  f\  and  /a  (since  A /  is  their  exact 
difference),  but  rather  because  of  the  initial  errors  incurred  in  rounding  f\  and 
/a;  note  that  if  «i  and  ca  are  sero,  9  =  0.  If  J\  nearly  equals  /a,  the  original 
high-order  significant  digits  cancel  during  subtraction,  which  means  that  low- 
order  digits  discarded  in  rounding  are  the  most  significant  digits  of  the  exact 
result  —  i.e.,  cancellation  reveals  the  error  of  rounding.  IT  /»  and  /a  are  not 
similar,  the  bound  on  the  cancellation  error  becomes  of  the  same  order  as  the 
error  resulting  from  any  other  floating-point  operation,  and  is  not  of  any  special 
significance. 

As  an  example,  consider  the  subtraction  of  the  numbers 

/i  =  .2946796847  and  /a  =  .2946782596  (8) 

on  a  machine  with  a  mantissa  of  six  decimal  digits  (c  =  10  #).  If  correct  round¬ 
ing  is  used,  the  values  of  f\  and  /a  are  .294680  and  .294678,  respectively,  with 
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the  difference  A/  (computed  exactly)  being  .2  X  10“"*.  However,  the  difference 
between  the  exact  values  of  J\  and  J%  is  .14251  X  10  *,  which  implies  that  L) 
has  a  relative  cancellation  error  of  if  =  .40841,  computed  from  (6).  From  (7) 
it  can  be  seen  that  the  bound  on  relative  cancellation  error  decreases  with  «;  if 
eight  figures  are  used  to  represent  ft  and  /*,  the  relative  cancellation  error  is 
.357  X  10~a. 

It  is  not  possible  to  compute  the  exact  value  (6)  of  the  relative  cancella¬ 
tion  error  without  utilising  the  exact  values  of  ft  and  /a,  and  exact  arithmetic. 
Therefore,  we  can  only  estimate  the  bound  (7)  on  the  cancellation  error.  For 
convenience,  we  shall  usually  refer  to  the  estimate  of  a  bound  on  the  cancellation 
error  as  simply  the  "cancellation  error",  and  shall  be  concerned  with  computable 
estimates  of  such  a  bound. 

One  possible  estimate  is  based  upon  finding  the  difference  in  the  mantissas 
of  /i  and  /a.  Suppose  that  f\  =  mifl4,  where  mi  is  the  mantissa  of  f\  and  e  its 
exponent.  The  number  /a  may  be  written  in  the  (possibly  unnormalised)  form 
/a  =  A  measure  of  the  relative  cancellation  error  is  given  by 


-r 


~7|mi  —  mat. 


mi  ft  ma, 
mi=sma. 


W 


If  this  formula  is  used  for  the  example  (8),  we  obtain  fi  =  .5  on  a  six-digit 
decimal  machine  and  fi  =  .704  X  10~*  on  an  eight-digit  machine. 

Note  that  we  adopt  the  convention  that  fi  is  unity  when  complete  cancel¬ 
lation  takes  place.  A  finite  upper  bound  on  the  estimated  cancellation  error  is 
essential  in  order  to  compute  fi,  but  implies  that  fi  will  underestimate  9  if  mi  is 
very  close  to  ma. 

Formula  (9)  is  related  to  a  good  "rule  of  thumb”  method  in  which  the  relative 
cancellation  error  is  estimated  by 


*  =  *Pk, 


(10) 


where  k  is  the  number  of  leading  coincident  digits  in  mi  and  ma. 

A  serious  disadvantage  of  (9)  and  (10)  is  that  it  is  not  a  straightforward 
process  to  obtain  the  mantissa  of  a  floating-point  number  when  using  a  high- 
level  language  such  as  Fortran.  An  alternative  formula  that  uses  the  floating¬ 
point  values  of  fi  and  /a  may  be  derived  as  follows.  Assume  that  fi  and  /a  are 
such  that  |/i|  i  |/a|.  Multiplying  the  denominator  and  numerator  of  (9)  by  ft* 
we  obtain 

$-r  ,  p'r*  L  eiem 

|mi  -  me |  |mjj5*  -  m^*|  p\ /,  -  /, |  * 
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ggll  <  gi-r  £sa 
filfi — Al  A — A 


which  leads  to  the  estimate 


(11) 


Since  this  formula  involves  only  e  and  the  floating-point  values  of  /*  and  /a, 
it  can  be  easily  implemented  in  a  high-level  language.  However,  we  have  incurred 
two  penalties  in  exchange  for  this  ease  of  use.  Firstly,  the  cancellation  error  is 
overestimated  by  the  amount  that  the  mantissa  of  A  is  larger  than  1/0.  The* 
mantissa  could  be  almost  0  times  larger  than  1/0  and  consequently,  on  average, 
(11)  will  be  a  better  estimate  of  17  on  a  binary  machine  than  on  any  other.  On  a 
machine  with  a  large  value  of  0  it  may  be  worth  computing  the  mantissas  of  A 
and  A  and  using  (9),  but  this  must  be  dons  carefully  to  avoid  introducing  further 
rounding  error  in  mi  and  fife.  One  way  of  computing  the  mantissa  of  a  floating¬ 
point  number  is  to  multiply  it  by  powers  of  0  until  the  product  lies  between  1/0 
and  1.  However,  this  technique  will  be  applicable  only  if  the  machine  arithmetic 
can  be  relied  upon  to  change  the  exponent  without  altering  the  mantissa  when 
a  number  is  multiplied  by  a  power  of  the  base.  This  method  may  also  be  too 
expensive  in  terms  of  the  number  of  operations  required.  For  example,  later 
we  shall  require  the  relative  cancellation  error  involved  in  computing  the  finite- 
difference  approximation  of  each  element  of  the  gradient  vector  of  a  multivariate 
function.  If  the  number  of  variables  is  large,  the  extra  computation  involved  in 
finding  the  mantissas  of  A  and  A  may  b*  prohibitive. 

The  second  disadvantage  of  (11)  is  that  the  resulting  value  of  fi  will  not  tend 
monotonically  to  unity  (the  convention  adopted  to  represent  complete  cancella¬ 
tion)  as  A  approaches  Ai  for  example,  the  value  of  fi  for  the  values  of  /1  and  /a 
given  by  (8)  with  six-digit  precision  (<  =  10  *)  is  1.473.  This  problem  is  easily 
overcome  by  redefining  fi  as 


lAk 

lA  -  Al + lAl< 


(13) 


If  (12)  is  used  in  example  (8),  we  obtain  fi  =  .5957  for  a  six-digit  mantissa  and 
fi  =  .7028  X  10“ 2  for  an  eight-digit  mantissa. 

It  is  important  to  note  that  during  the  derivation  of  all  the  preceding  for¬ 
mulae  for  the  relative  cancellation  error,  we  have  assumed  that  the  errors  in  both 
A  and  A  are  of  the  order  of  the  machine  precision.  If  this  is  not  the  case,  the 
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formulas  may  not  be  appropriate.  Suppose  that  the  values  (8)  are  computed  on 
a  four-digit  machine  and  then  represented  on  a  six-digit  machine,  so  that  the 
last  two  figures  are  unreliable  —  for  example,  in  this  case  it  might  happen  that 
ft  =  .294750  and  A  =  .294722,  so  that  the  exact  cancellation  error  in  their 
computed  difference  is  .1864774  X  103.  However,  (12)  gives  =  .9528  X  10“ l, 
a  considerable  underestimate  of  the  error.  The  required  estimate  of  cancellation 
error  must  involve  only  the  correct  digits  of  A  and  /a.  Hence,  if  A  and  /a  are 
computed  to  a  relative  precision  of  o,  that  is 

A  =  /i(l  +  *i)  and  /a  =  /a(l  +  ®a)» 

where  Joi|,  |oa|  <  o,  then  the  formula 

|A-Al  +  |A|«  <l8) 

should  be  used  instead  of  (12).  This  distinction  will  be  important  when  estimat¬ 
ing  the  cancellation  error  associated  with  approximations  to  second  derivatives. 


3.  Errors  in  finite-difference  approximations 

For  simplicity  of  description,  the  discussion  in  the  next  two  sections  will  concern 
estimating  derivatives  of  the  twice  continuously  differentiable  univariate  function 
/(x),  but  all  results  apply  directly  to  the  multivariate  case.  We  shall  assume 
that  /  can  be  computed  to  full  machine  precision. 

The  first  type  of  approximate  derivative  to  be  considered  is  the  forward 
difference  formula  with  interval  h: 

(h) 

We  stress  that  p,  is  meant  to  denote  the  exact  expression  (14),  obtained  by 
applying  exact  arithmetic  to  the  exact  values  of  all  quantities.  The  truncation 
error  in  the  mathematical  approximation  (14)  consists  of  the  neglected  terms  in 
the  Taylor  series  (1): 


*>,(*)  -  /'(*)  =  i*/"W  +  0(*»).  (15) 

Of  course,  fp,(h)  is  not  available  in  the  real  world,  and  thus  we  shall  be  con¬ 
cerned  with  its  computed  version,  whose  evaluation  involves  several  operations 
of  rounding  and  floating-point  arithmetic.  Let  A  denote  the  computed  version 
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*,  =  /(*  + A) -/(*),  (1«) 

where  +  and  —  in  (18)  denote  float/ag-point  addition  and  subtraction.  The 
computed  quantity  that  corresponds  to  (14)  is  then  given  by 


i 

i 


» 


‘ 


0,  =  K/h 

where  floating-point  division  is  implied. 

The  previous  discussion  of  errors  in  computation  of  quantities  like  A,  shows 
that  for  "small*  h,  the  relative  error  in  0,  compared  to  <pr  will  be  dominated 
by  the  cancellation  error  arising  from  computing  A„  and  henceforth  we  shall 
consider  only  this  source  of  computational  error  in  to  be  significant.  If  the 
other  errors  from  computation  were  included,  this  would  only  add  a  number  of 
factors  of  the  form  (1  +  c)  to  the  error  estimates. 

Using  (13),  a  computable  estimate  of  the  cancellation  error  in  £  is  given 
by 

_ _ i/wi<  (1T1 

where  the  arithmetic  operations  in  (17)  refer  to  floating-point  operations,  and  for 
simplicity  we  have  assumed  that  [Afl)l  >  |/(t  +  A)|.  We  adopt  the  convention 
that  ffr  is  unity  if  both  }{t)  and  f(t  +  A)  vanish. 

When  second  derivatives  are  to  be  approximated  by  finite  differences,  a 
similar  analysis  may  be  carried  out  for  this  case.  Define  4(h)  as  the  exact  quan¬ 
tity 

m  =  ^(/(* + h)  -  a/M + /(*  -  A)),  (w) 

and  &  as  its  computed  version. 

As  with  we  can  assume  that  the  only  significant  source  of  computational 
error  in  $  is  due  to  cancellation  in  subtracting  nearly  equal  rounded  function 
values.  An  important  aspect  of  computing  &  is  the  use  of  two  intermediate 
values:  and  the  corresponding  backward-difference  value  $B,  defined  as 


,  _/(*)-/(*-*) 
A 


(19) 


which  is  computed  using  a  value  A,  analogous  to  (16).  Hence,  A  can  be  written 

as 


* 


(20) 


i 
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where  the  arithmetic  operations  are  performed  in  floating-point. 

Once  again,  it  is  reasonable  to  attribute  the  m^jor  error  in  computing  4  to 
cancellation  error;  in  this  case,  however,  there  is  cancellation  error  not  onlj  in 
forming  but  also  in  the  intermediate  values  and  Formula  (13)  can  be 
applied  to  (20)  to  obtain  T,  an  estimate  of  the  computational  error  in 


(21) 


where  o  is  an  upper  bound  on  the  cancellation  errors  associated  with  and 
$>m.  To  compute  o,  we  estimate  1),  from  formula  (17),  and  also  the  analogous 
quantity  t)M  corresponding  to  a  is  then  given  by  max(^,,  flB).  It  is  important 
to  note  that  6  will  be  a  reasonable  estimate  only  if  a  is  sufficiently  small. 


4.  Computing  an  estimate  of  the  optimal  finite-difference  interval 


4.1  Effects  of  h  on  cancellation  and  truncation  errors. 

Changes  in  the  sise  of  the  finite-difference  interval  tend  to  have  opposite  effects  on 
truncation  and  cancellation  error.  When  the  value  $>r  is  used  as  an  approximar 
tion  to  /',  for  example,  the  truncation  error  is  dominant  for  large  values  of  h, 
since  computation  introduces  only  negligible  error;  as  h  decreases,  truncation 
error  in  the  exact  value  <pr  decreases,  but  the  cancellation  error  in  computing 
$>r  becomes  larger. 

To  illustrate  tins  phenomenon,  consider  the  function 


/(*) 


(22) 


which  has  been  evaluated  for  various  values  of  h  at  the  point  x  =  1,  using  short 
precision  on  an  IBM  370.  The  smallest  value  of  h  that  will  register  a  change  in  x 
during  floating-point  addition  is  the  machine  precision  e  =  16— 8  «  .95  X  10~*. 
The  function  was  computed  with  h  values  increasing  from  €  in  multiples  of  ten. 
Table  1  contains  the  results  of  the  computation.  The  first  three  columns  contain 
the  values  of  h,  the  computed  function  value  at  x  —  1,  and  the  computed  func¬ 
tion  value  at  x  -f-  h.  The  fourth  column  contains  values  of  T{h),  the  relative 
truncation  error  incurred  by  using  the  exact  value  <p,(h)  to  approximate  /*.  The 
fifth  column  contains  an  estimate  of  the  relative  cancellation  error  in  using 
formula  (13)  with  a  =  c.  The  final  column  contains  the  computed  values  of 
The  exact  value  of  /'(s)  (rounded  to  six  figures)  is  .954366  X  101. 
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Cancellation  and  truncation  errors  in  with  c  =  .053674  x  10~* 


a 

/<*) 

/(*+*) 

m 

MM 

< 

.305828  X  10* 

.303828  X  10* 

.121183  X  10“* 

.302670  X  10* 

.700000  X  10* 

toe 

.303828  X  10* 

.30383T  X  10* 

.121180  X  10“4 

.300016  X  10“* 

.050000  X  10* 

10a« 

.308828  X  10* 

.303010  X  10* 

.121188  X  10-* 

.318226  X  10-* 

.052000  X  10* 

10*« 

.303828  X  10* 

.304TS0  X  10* 

.121264  X  10-* 

J18T30  X  10-* 

.055800  X  10* 

104« 

.303828  X  10* 

.313045  X  10* 

.122021  X  10-* 

.323888  X  10~4 

.066400  X  10* 

The  results  indicate  that  for  small  h,  the  error  in  $>r  can  be  dominated  by 
cancellation  error  to  the  extent  that  no  figures  may  be  correct.  As  the  interval  is 
increased,  the  relative  truncation  error  increases,  but  the  relative  cancellation  er¬ 
ror  decreases.  The  truncation  error  in  approximating  /'  by  is  approximately 
linear  in  h  and  the  cancellation  error  is  approximately  linear  in  1/fc;  this  property 
will  have  important  implications  in  methods  for  computing  reasonable  values  of 
h.  Clearly  there  is  an  optimal  value  of  h  that  “balances”  the  relative  trunca¬ 
tion  and  cancellation  errors,  i.e.  for  which  these  errors  are  approximately  equal. 
Examination  of  Table  1  indicates  that  the  optimal  value  of  h  for  the  computed 
example  lies  between  h  =  .95  X  10*“4  and  h  =  .95  X  10~*. 

The  relationship  between  cancellation  errors  in  #>r,  $>B,  and  &  is  significant 
in  procedures  for  estimating  /#  and  /".  Note  that  T,  the  cancellation  error  in 
&,  depends  not  only  on  the  closeness  of  and  <?>B,  but  also  on  the  value  of  a, 
which  reflects  the  cancellation  error  in  computing  the  first-order  estimates  of  /'. 
When  h  is  so  small  that  cancellation  error  dominates  $F  and  $>B,  the  cancellation 
error  in  &  will  also  be  large,  not  because  and  $>B  are  nearly  equal,  but  rather 
because  a  is  large.  In  addition,  the  value  of  T  remains  large  even  when  h  has 
been  increased  enough  so  that  and  $B  are  most  accurate.  The  cancellation 
error  in  4>  becomes  reasonable  only  when  two  conditions  are  satisfied:  h  is  large 
enough  so  that  the  cancellation  errors  ffF  and  fja  are  small,  and  so  that  $>r  and 
0B  differ  by  a  significant  amount. 

In  Table  2,  we  illustrate  this  relationship  for  the  function  (22).  The  exact 
value  of  fu,  rounded  to  six  figures,  is  .242661  X  10a;  note  that  &  is  wrong  by 
several  orders  of  magnitude  until  h  has  exceeded  the  "optimal"  value  for  com¬ 
puting  and  $>B.  This  point  is  important  in  methods  based  on  approximating 
both  f  and  fM  by  finite  differences. 


4.2  Balancing  truncation  and  cancellation  errors. 

To  find  an  interval  that  balances  truncation  and  cancellation  errors,  it  is  neces¬ 
sary  to  compute  estimates  of  these  quantities.  If  /'  is  not  pathologically  small 
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Table  2 

Errors  in  approximating  /'  and  J" 


A 

hr 

hm 

T 

A 

< 

.302510  X  109 

.233028  X  10* 

.502215  X  109 

—.314573  X  10T 

1Q< 

.300015  X  10-* 

.800703  X  10~l 

.503080  X  IQ9 

— .314572  X  109 

10a« 

.318233  X  10“a 

.313805  X  10“9 

.431000  X  10* 

— .410430  X  109 

10»< 

.318730  X  10-‘ 

J 18477  X  10-9 

.123300  X  10* 

.220200  X  10* 

104« 

.323888  X  10“* 

.322040  X  10“4 

.135330  X  10“* 

.242221  X  10* 

109« 

.373034  X  10-* 

J 58783  X  10“9 

.175133  X  10“4 

.243510  X  10* 

with  respect  to  /  and  /*,  and  A  > «,  the  cancellation  error  in  (from  (13))  is 
approximately  a  linear  function  of  1/A.  Hence,  for  two  intervals  Aj  and  Aa* 


(23) 


The  truncation  error  clearly  involves  higher  derivatives  of  /,  which  are  (by 
assumption)  not  available.  Therefore,  the  procedure  for  obtaining  an  "optimal” 
interval  utilises  estimates  of  the  needed  values.  In  the  remainder  of  this  sub* 
section,  we  shall  assume  that  sufficiently  accurate  approximations  are  available; 
in  Section  4.4  we  present  a  procedure  for  obtaining  such  estimates  using  function 
values  only. 

We  seek  t{h),  a  computable  estimate  of  the  relative  truncation  error,  that 
may  be  balanced  against  the  relative  cancellation  error  J),.  The  idea  of  balancing 
these  two  sources  of  error  carries  the  implicit  assumption  that  they  are  measured 
in  a  comparable  manner.  In  particular,  since  the  measure  of  relative  cancellation 
error  is  bounded  above  by  unity,  this  requirement  is  also  imposed  on  t[h). 

In  almost  all  instances,  a  suitable  model  for  t(h)  is 


2imAi/wr 


(24) 


which  is  based  on  the  first  neglected  term  of  the  Taylor  series.  This  expression 
for  t(h)  is  bounded  above  by  unity;  we  also  adopt  the  convention  that  t(h)  =  1 
when  /*  =  f"  =  0. 

If  \f"\  is  pathologically  small  with  respect  to  |/*|,  a  more  complicated  ex¬ 
pression  for  T(h)  is  appropriate: 


ft*  =  *1/*+ win +Di 

1 1  wi+Ai/'+wm+Dr 


(25) 
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Let  k  denote  an  estimate  of  the  absolute  interval  for  which  truncation  and 
cancellation  error  are  balanced.  Assuming  that  (),  has  been  computed  for  some 
interval  h,  k  can  be  obtained  by  equating  (23)  and  (24): 


*1/1 

a|/'l+A|/*f 


The  deeired  h  it  the  poiftWe  root  of  the  renlting  quadratic  equation 

AVl  -  A*l/%(»)  -  »!/'»,(*)  = 


(26) 

(27) 


so  that 


k 


(28) 


When  formula  (25)  must  be  used  to  estimate  t(h),  k  is  the  solution  of  a 
cubic  equation. 


4.3  Finding  intervals  for  well-sealed  functions. 

For  many  functions,  a  near-optimal  value  of  h  can  be  derived  bj  inspection. 
Suppose  that  at  a  point  x  (—1  <  x  <  1),  the  values  of  the  function  /  and  all  its 
derivatives  are  of  the  same  order  of  magnitude;  the  balancing  relationship  (26) 
then  yields  an  k  of  order  y/i.  This  result  can  also  be  derived  by  inspection,  as 
follows. 

The  relative  truncation  error  in  approximating  /*  by  p,  is  0(h);  further¬ 
more,  a  perturbation  h  induces  a  relative  change  of  0(h)  in  the  function  value 
and  its  derivatives.  Figure  1  depicts  the  mantissas  of  /(x),  f[x  +  h)  and  A/  = 
/(*  +  h)  —  /(as)  on  a  six-digit  machine,  with  h  =  y/i,  where  the  errors  in  round¬ 
ing  have  been  ignored.  This  perturbation  causes  half  the  significant  figures  of 
/(x  -f  h)  to  be  different  from  those  of  /(x);  the  identical  digits  are  marked  with  a 
"•*.  In  computing  A/,  only  half  the  precision  of  /  and  /(s+h)  will  be  retained, 
and  A  /  itself  will  have  a  relative  precision  of  only  y/e,  so  that  the  three  least 
significant  digits  of  A/,  marked  with  a  ax”,  are  unreliable.  Thus,  the  truncar 
tion  and  cancellation  errors  in  0,  are  of  similar  order;  it  is  this  reasoning  that 
underlies  the  "folklore”  value  of  as  the  "best*  finite-difference  interval. 


mantissa  of 

/(*) 

•  1  •  1  •  I  m4  |  ms  1  m#  1 

mantissa  of 

/(*  +  *) 

•  1  •  1  •  1  ms  1  m»  |  ms  I 

mantissa  of 

kj 

1  mi  ms  1  ms  X  1  X  I  X  1 

Figure  1.  Schematic  diagram  of  six-digit  mantissas  during  first-order  differencing. 
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mutlm  of  4>f 
auattn*  of 
maatlm  of  — 


I  •  1  •  I  m»  I  wm  I  X  I  X  t 


•  I  ‘  I  «s  I  «N  I  X  I  X  | 


milimlxlxlx  1x1 


Figure  2.  Schematic  diagram  of  lix-digit  mantissas  during  second-order  differencing. 


Similar  perturbation  analytes  can  be  used  to  estimate  nearly  optimal  finite- 
difference  intervals  for  other  approximations.  For  the  central-difference  formula 
defined  by 

rJM^lSs±^=JSsz -A  (») 

the  truncation  error  is  0(Aa);  however,  the  relative  cancellation  error  in  the  com* 
puted  version  $>c  remains  at  0{t/h)  (estimated  from  (13),  assuming  that  A  >  c). 
This  implies  that  the  optimal  finite-difference  interval  is  O(^e). 

Finding  the  optimal  interval  to  approximate  /*  using  (18)  is  more  compli¬ 
cated,  as  noted  in  Section  4.1.  Let  6  be  computed  by  the  formula  (20).  In  this 
case,  we  wish  to  balance  the  truncation  error  and  cancellation  error  as  usual, 
but  it  is  essential  to  include  the  effects  of  cancellation  error  in  computing  the 
intermediate  quantities  and  For  a  perturbation  h,  $r  and  $>B  will  have 
cancellation  errors  of  0{tfh).  Thus,  applying  (13)  to  (20),  the  relative  cancella¬ 
tion  error  in  &  will  be  0(t/h2),  assuming  that  h  >  c/A.  Since  the  truncation 
error  in  $  is  0(A),  the  optimal  interval  in  this  case  is  Figure  2  depicts  the 
relevant  six-digit  mantissas  when  using  6  to  approximate  fH  with  A  =  In 
this  case  the  two  least  significant  digits  of  $r  and  $>B  will  always  be  unreliable, 
and  their  computed  difference  will  have  an  associated  cancellation  error  of 
The  foregoing  analysis  will  not  apply  if  for  f"  vanishes,  or  if  the  derivatives 
of  /  vary  substantially  in  magnitude  at  the  point  of  interest.  If  /*  is  sero  and 
l/wl »  |/|,  a  relative  perturbation  in  x  of  0(A)  will  produce  a  relative  perturbar 
tion  in  /  of  0(Aa)  rather  than  0(A),  and  consequently  a  perturbation  of  0(y/e) 
will  generally  produce  identical  figures  in  ail  the  digits  of  the  mantissas  of  /(*) 
and  /(*  +  A).  A  finite-difference  estimate  of  /'  is  thus  bound  to  be  inaccurate 
in  a  relative  sense  when  /*  is  small  compared  to  /  and  /”,  since  either  or  both 
sources  of  error  will  dominate  the  magnitude  of  /',  regardless  of  whether  they 
are  balanced  or  not.  Nonetheless,  we  can  still  compute  a  satisfactory  estimate 
of  /"  with  A  =  v^f,  because  and  $B  will  retain  some  absolute  accuracy. 


4.4  Computing  estimates  of  /*  and  /*. 

The  values  of  /'(*)  and  /”(*)  are  required  if  (28)  is  used  to  compute  estimates 
of  the  optimal  finite-diflbreace  interval.  Since  these  values  are  assumed  to  be 
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unavailable  (otherwise,  finite-difference  intervals  would  not  be  a  source  of  con¬ 
cern),  we  must  be  able  to  compute  meaningful  approximations  to  /'  and  f"  with 
some  initial  interval  k  in  order  to  estimate  the  desired  interval  h.  Note  that  it  is 
unnecessary  to  compute  /*  or  /*  to  high  accuracy,  since  an  estimate  of  h  with 
just  one  correct  figure  will  suffice. 

We  hope  to  be  able  to  use  a  single  initial  finite- difference  interval  h  to  com¬ 
pute  both  estimates  in  order  to  reduce  the  number  of  times  that  the  function  is 
evaluated.  This  interval  and  its  associated  cancellation  error  can  then  be  used  as 
the  values  that  appear  explicitly  in  the  formula  (28).  Since  two  function  evaluar 
tions  are  required  to  compute  4,  these  values  may  also  be  used  to  compute  a 
central-difference  approximation  $>a  to  f,  using  (29). 

The  procedure  that  we  suggest  is  based  on  choosing  h  to  be  large  enough 
so  that  the  cancellation  error  in  6  is  less  than  1  IP,  which  implies  that  the  es¬ 
timate  of  /"  has  approximately  two  figures  free  of  computational  error.  The 
corresponding  interval  should  give  an  adequate  central  difference  estimate  of  /', 
using  (29). 

For  a  well-scaled  function,  the  interval  y/t(  1  +  |z|)  will  generally  produce 
no  correct  digits  in  4,  while  providing  the  best  value  of  0,.  Hence,  the  initial 
finite-difference  interval  used  to  compute  &  is  given  by 

d«M(l  +  l*l),  (30) 

because  for  a  well-scaled  function  we  would  expect  the  resulting  &(A)  to  contain 
approximately  two  digits  free  of  computational  error.  However,  bad  scaling  in  / 
may  cause-  (30)  to  be  a  poor  choice  for  the  finite-difference  interval. 

The  algorithm  that  we  propose  for  choosing  an  appropriate  value  of  h  is 
based  on  the  observations  in  Section  4.1  concerning  the  relationship  between 
9jb,  and  T.  If  (30)  is  too  small,  6  will  tend  to  display  excessive  cancellation  error 
for  one  of  two  reasons:  and  f)t  are  too  large,  or  0,  and  $>B  are  too  close.  If 
the  interval  (30)  is  too  large  (which  implies  that  the  truncation  error  in  6  may 
be  unacceptably  large),  the  cancellation  error  will  tend  to  be  quite  small.  In 
effect,  the  interval  is  chosen  to  be  the  smallest  possible  for  which  the  cancellation 
error  still  allows  some  accuracy.  If  the  truncation  error  for  such  an  interval  is 
unacceptable,  no  satisfactory  interval  exists. 

In  the  implemented  version  of  the  following  algorithm,  p  =  fi,  and  K  — 
fr/2],  where  fa)  is  the  smallest  integer  greater  than  or  equal  to  o.  The  formal 
statement  of  the  algorithm  is: 

Algorithm  1  (Estimating  /'  and  /w  by  Suite  differences ). 

1.  (Initialisation.)  Evaluate  /(*).  Set  k  ♦- 1,  h  «-  py/i{ I  +  |c|). 

2*  Evaluate  /(*  -f  h),  /(*  —  h),  and  the  corresponding  finite-difference  ap¬ 
proximations  and  estimates  of  cancellation  error. 
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S.  If  max (#,,  t)B)  <  1/p  or  k  =  K,  fo  to  step  5;  otherwise,  go  (o  stop  4. 

4.  [Increase  h.]  Sot  k  ♦-  k  + 1,  A «-  pA  and  return  to  itop  2. 

5.  [Either  the  cancellation  error  in /'  ia  small  enough  or  A  h as  been  increased 

times.]  Set  A,  «-  A.  If  T  <  l/pa  and  h  =  l,  go  to  step  8;  otherwise,  go 
to  step  8. 

8.  If  T  <  1/p,  go  to  step  10.  If  k  <  K  and  T  >  1/p,  go  to  step  7.  Otherwise, 
set  A «-  A,  and  go  to  step  10. 

7.  [Increase  A  in  order  to  find  a  suitable  estimate  of  /*.]  Set  k  «-  k  + 1  and 
A  «-  ph.  Evaluate  /(*+ A),  /(*— A),  and  the  corresponding  finite-difference 
approximations  and  estimates  of  cancellation  error.  Return  to  step  6. 

8.  [Decrease  h  in  order  to  increase  cancellation  error  in  6.]  Set  k  «-  k  +  1 
and  A  hfp.  Evaluate  /(*  +  A),  /(*  —  A),  and  the  corresponding  finite- 
difference  approximations  and  estimates  of  cancellation  error.  Go  to  step  0. 

9.  If  1/p  >  T  >  1/p2,  go  to  step  10.  If  k  <  K,  go  to  step  8.  Otherwise,  set 
A «-  h,  and  go  to  step  10. 

10.  [Test  sise  of  second  derivative.]  If  ♦  >  A  min(0,,  $m),  the  algorithm  ter¬ 
minates. 

11.  [Estimate  third  derivative.]  Evaluate  /(*+pA),  and  compute  7,  an  estimate 
of 

7 = + ,h) + (w*  -  *>  -  + *» 

+  /(*)  -  jP(/(*  +  *)  -  */(*)  +  /(*  -  »»)• 

The  algorithm  then  terminates.  | 

In  practice,  /(*  —  A)  is  not  evaluated  in  step  2  if  is  too  large. 

To  illustrate  the  behavior  of  Algorithm  1,  we  consider  finding  the  derivatives 
corresponding  to  the  first  component  of  the  multivariate  function 

F(x)  =  (*,  - 100)*  +  (*a  - 1000)’  +  (*a  +  *4  - 1000)’ 

+  (*,-1000)a- (*,  +  *, -200)*  (»1) 

+ 10-»(*,  +  2ta  +  3xa  +  4*4  +  5*,  +  «*,  +  7*r  -  W00)*, 

*t  tha  point  {0,0,400, 100, 0,0, 0)T.  TaW.  3  lira  Un  trial  Taints  ot  h  fnaratxd 
by  Algorithm  1,  along  with  the  corresponding  estimates  of  cancellation  error, 
0,,  and  4.  The  exact  values  of  the  first  and  second  derivatives  (rounded  to  six 
figures)  are  —199.730  and  1.99820.  The  final  value  of  0O  is  —199.730. 


4r 


I 


I 


a 


.ssoooo  x  io* 

.633344  X  10* 

—456000  X  10* 

—.400000  X  10* 

.40633$  X  10~x 

.6T4364  X  10" 

-400000  X  10* 

—.100000  X  10* 

.36066#  X  10-* 

461340  X  10-» 

—.105750  X  10* 

.103700  X  10* 

.343641  X  10-* 

401040  X  10“* 

—.116707  X  10* 

.100700  X  10* 

4.5  Numerical  examples. 

The  success  of  the  suggested  procedures  for  obtaining  sensible  finite-diffsreiice  in¬ 
tervals  depends  on  two  assumptions:  that  the  linear  relationship  (26)  in  l/k  holds 
for  cancellation  error,  and  that  relative  truncation  error  is  reliably  estimated  by 
formula  (24),  using  the  estimates  of  /'  and  f"  produced  by  Algorithm  1.  In 
this  section  we  give  some  numerical  examples  of  the  computation  of  estimates 
of  optimal  finite-difference  intervals,  using  Algorithm  1  to  obtain  initial  finite- 
difference  estimates  of  /'  and  /*,  followed  by  application  of  formula  (28)  to 
compute  A. 

Tables  4  and  5  contain  the  results  of  estimating  an  absolute  finite-difference 
interval  for  each  gradient  component  ty(s),  /  =  l,...,»i,  for  two  multivariate 
functions.  In  each  table,  the  first  column  contains  the  index  of  the  gradient 
component.  The  second  column  contains  the  final  value  of  h,  as  obtained  by 
Algorithm  1,  and  the  third  column  contains  A,  computed  from  (28).  The  fourth 
column  contains  the  value  of  h*,  which  satisfies  the  nonlinear  equation 


MO- 


/'(*) 


Thus,  h*  is  the  finite-difference  interval  for  which  the  cancellation  error  and 
the  exact  relative  truncation  error  are  "equal”,  where  the  tolerance  that  defines 
equality  must  be  chosen  based  on  the  limited  accuracy  of  the  computed  value  of 

&r’ 

In  Table  4,  we  give  results  for  the  function: 


F(*)  =  (*i  +  10«a)a  +  5(«,  -  *4)*  +  (*,  -  it,)*  +  10(f  i  -  *«)*,  (*») 

•t  the  point  *  =  (3,  —1,0,  l)r.  The  rettebitttp  of  both  underlying  terampUoae 
is  shown  by  the  similarity  of  columns  8  and  4.  In  addition,  Table  4  displays  the 
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Table  4 


Optimal  and  Estimated  Intervals  for  a  Wed-Scaled  Problem 


•MM  X  10“* 
•1M*  X  10“* 
•IMS  X  10“* 
•ISM  X  10“* 


.8347  X  10“* 
.1405  X  10“* 
MTS  X  10“* 
.0214  X  10“* 


a* 

•11MX  10“* 
•1440  X  10“* 
MOO  X  10“* 
.0814  X  10“» 


wall-scaled  nature  of  the  function  (32),  since  the  optimal  interval  for  the  t-th 
variable  is  of  order  >/c(l  +  |s,|). 

Table  5  contains  the  analogous  results  for  the  seven-variable  badly  scaled 
function  (31).  The  optimal  intervals  are  again  closely  predicted  using  the  proce¬ 
dures  of  Sections  4.3  and  4.4. 


5.  Using  finite  differences  in  minimization 

A  descent  method  for  minimising  F(x),  x  €  ft*,  proceeds  as  follows.  Let  k  denote 
the  current  iteration,  and  let  a*  denote  the  h-th  estimate  of  the  solution  (starting 
-with  k  ss  0  at  the  initial  point  xq).  Each  iteration  requires  y(x*)  (sometimes 
denoted  by  gu),  the  gradient  vector  VF(x)  evaluated  at  x*.  At  the  fc-th  iteration 
a  direction  of  search  p*  is  computed  such  that  the  directional  derivative  g[pk  is 
negative,  and  the  new  estimate  x*+i  is  given  by  x* + <*kPk,  where  a*  (a  positive 
step  length)  is  chosen  such  that  F(zk  +  o*p*)  <  F[*h). 

Quasi-Newton  methods  are  probably  the  most  commonly  used  techniques  for 
unconstrained  minimisation  when  only  first  derivatives  are  available  (see  Dennis 
and  Mor4, 1077).  In  a  quasi-Newton  method,  the  direction  of  search  is  defined 

Tables 

Optimal  end  Estimated  Intervals  for  •  Badly  Scaled  Problem 


.5400  X  10* 

•ITT!  X  10* 

4TTI  X  10* 

.5400  X  10* 

.1408  X  10* 

.1TM  X  10* 

.SSM  X  I01 

.1404  X  10* 

•1T84  X  10* 

.8585  X  10* 

.1484  X  10* 

4TSS  X  10* 

.5400  X  10* 

.14M  X  10* 

.1T40  X  10* 

.4000  X  10* 

.1448  X  10* 

4880  X  10* 

MMX  10* 

.1401  X  10* 

.1855X10* 
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as  the  solution  of  the  linear  system 


BkPk  =  -ft, 


(33) 


where  Bk  is  an  approximation  to  the  Hessian  matrix.  Alter  each  iteration  the 
approximate  Hessian  is  updated,  the  most  popular  update  being  the  BFGS  for¬ 
mula 

Bh+i  =  3*  * f  ~~9k9k  +  —~^1k9k, 

9lPk  WlPt 


where  ft  =  ft+i  —  ft  (see  Dennis  and  Mor4, 1977).  The  BFG8  update  is  a 
special  case  of  the  more  general  Brojden  one-parameter  family 


1 

“tflPk 


Vk9k 


-+k9kPkvl"k> 


(34) 


where 


«*  = 


1 


and  ph  is  a  scalar  fraction  of  ft  and  asp*  (see  Brojden,  1970;  GUI  and  Murray, 
1972). 

A  quasi-Newton  method  can  be  implemented  even  when  first  derivatives  are 
not  available  by  using  a  finite-difference  estimate  fo  of  the  gradient  vector  ft. 
We  now  consider  two  methods  for  choosing  a  reasonable  finite-difference  interval 
to  compute  the  approximate  gradient. 


5.1  Stewart's  modification  of  a  quasi-Newton  method. 

Stewart  (1967)  suggested  a  modification  of  a  quasi-Newton  method  in  which  the 
finite-difference  interval  is  re-estimated  at  each  iteration.  Hi  the  foUowing  dis¬ 
cussion,  we  suppose  that  the  1-th  iteration  has  been  performed,  and  that  is 
available.  Let  denote  the  jth  element  of  fa,  and  F/  the  ;th  diagonal  element 
o1Bk. 

The  finite-difference  interval  is  chosen  by  balancing  estimates  of  relative 
truncation  and  cancellation  errors.  Both  sources  of  error  are  estimated  using 
a  local  quadratic  approximation  based  on  the  function  value  at  x*+i,  and  the 
approximate  gradient  and  quasi-Newton  approximation  to  the  Hessian  from  the 
previous  iteration: 

F(**+i  +  fyiy) «  q(h/)  *  Fk+i  +  hfij  +  jh}//. 
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The  relationship  that  balances  the  two  sources  of  orror  ia  than 


2 

V 

(35) 


whore  the  left-head  side  of  (35)  represents  the  relative  truncation  error,  and  the 
right-hand  side  is  an  estimate  of  the  relative  cancellation  error.  The  desired 
value  of  hj  then  solves  the  resulting  cubic  equation. 

It  is  customary  in  quasi-Newton  methods  to  recur  either  the  inverse  or  a  fac¬ 
torisation  of  Bk,  in  order  to  facilitate  the  solution  of  equation  (33).  Consequently, 
with  Stewart's  approach  the  diagonal  elements  of  Bk  must  be  recurred  separately. 
To  recur  these  elements  for  the  BFGS  update,  for  example,  let  7y  and  ^y  denote 
the  j-th  elements  of  *nd  pk  respectively,  where  fo  is  computed  with  the  ap¬ 
proximate  gradients;  let  7y  denote  the  /-th  diagonal  of  the  approximate  Hessian 
during  iteration  k  •+• 1.  Then 


*V  =  'V  +  -f-T?  + 

fiPk 


akVkPk 


Stewart’s  estimate  of  the  finite-difference  interval  depends  critically  upon  the 
assumption  that  the  diagonal  elements  of  the  approximate  Hessian  matrix  are 
similar  in  magnitude  to  the  diagonal  elements  of  the  true  Hessian.  The  follow¬ 
ing  theorem  indicates  that  tins  assumption  is  unjustified,  even  after  the  first  n 
iterations.  It  is  unlikely  to  be  true  before  this  point  because  the  set  of  generated 
directions  does  not  span  the  full  space. 

Theorem  1.  Let  F(x)  be  a nj  twice-continuousljr  differentiable  function.  Let  {**} 
be  a  sequence  of  points  gonerated  by  applying  the  BFGS  update,  for  some  given 
initial  zo  and  Hessian  approximation  Bo,  where  each  on  is  the  smallest  positive 
step  from  xk  to  a  minimum  of  F  along  p*.  Let  {*£}  be  the  sequence  generated  by 
using  update  (34),  with  the  same  starting  point  and  initial  approximate  Hessian. 
Let  pj,  oj  and  B*  denote  the  corresponding  values  ofp*,  a*  and  £*  respectively, 
where  a£  satisfies  the  same  criteria  as  a*.  Jf  each  of  the  sequences  {Bk}  end 
{£*}  is  well-defined,  then 


and 


(38) 


(3T) 


Proof.  A  proof  is  given  by  Dixon  (1972).  | 


$5.2 


AN  ALTERNATIVE  PROPOSAL 


10 


Equation  (37)  implies  that  the  same  sequence  {**}  will  be  generated  b y  any 
well-defined  member  of  the  Brqyden  class  of  quasi-Newton  updates  when  the  step 
length  at  eaeh  iteration  is  the  nearest  minimum  along  the  direction  of  search. 
Similarly,  (36)  implies  that  all  the  approximate  Hessian  matrices  generated  by 
the  one-parameter  family  differ  by  s  matrix  of  rank  one.  Thus,  even  tor  a  quad¬ 
ratic  function,  the  elements  of  the  approximate  Hessian  will  generally  be  poor 
approximations  to  the  elements  of  the  true  Hessian. 

One  unfortunate  consequence  of  this  result  is  that  for  functions  that  are  well 
scaled,  the  interval  found  by  Stewart's  method  may  diflur  substantially  from  >/i, 
and  hence  the  gradient  elements  will  contain  unnecessary  error. 


5.2  An  alternative  proposal. 

We  propose  an  alternative  procedure  for  selecting  the  finite-difference  interval, 
based  on  the  following  observations.  The  procedures  described  in  Section  4 
require  at  least  2n  function  evaluations  to  compute  a  satisfactory  set  of  inter¬ 
vals,  and  thus  would  be  prohibitively  expensive  if  carried  out  at  every  iterar 
tion.  Although  Stewart's  method  does  not  require  additional  function  evalua¬ 
tions,  we  do  not  favor  its  use  because  there  is  no  theoretical  basis  for  the  as¬ 
sociated  estimates  of  truncation  and  cancellation  errors,  as  shown  in  Section  5.1. 
Furthermore,  in  our  experience  the  estimates  produced  by  Stewart's  method  do 
not  reliably  reflect  the  true  site  of  the  errors. 

The  proposed  method  executes  the  procedures  of  Section  4  at  the  initial 
point  to  develop  a  set  of  absolute  intervals  Ay,  j  =  1, . . . ,  n.  The  relative  interval 
Sj  used  for  the  j- th  variable  is 


1**1  <  1. 

1**1  > 


where  z  is  the  point  at  which  {Ay}  are  computed.  In  effect,  Ay  is  interpreted  as 
an  absolute  step  when  |zy|  is  of  order  unity  or  less,  and  as  a  relative  perturbation 
otherwise. 

The  approximate  gradients  are  computed  with  the  forward-difference  for¬ 
mula  (2)  until  they  appear  to  have  become  "unreliable*.  For  example,  the 
forward-difference  estimate  is  not  used  when  the  relative  cancellation  error  in 
every  component  of  the  approximate  gradient  exceeds  1/0.  Hi  this  case,  a  switch 
is  made  to  the  central-difference  approximation  (8),  with  the  slightly  larger  in¬ 
terval 


which  is  derived  from  an  analysis  similar  to  that  in  Section  4  J.  To  conserve 
the  number  of  evaluations  of  F{ «),  it  is  recommended  that  the  first  central- 


20  7INITB-DIPFXRBNCB  APPROXIMATIONS 


§5.2 


difference  approximation  be  computed  with  hj,  since  the  let  of  function  values 
{F[x  +  hjtj)}  will  hare  already  been  computed  for  the  forward-difference  es¬ 
timate.  A  switch  to  central  differences  is  also  made  if  the  step  length  a*  is 
such  that  the  change  in  z  will  be  less  than  the  current  finite-difference  interval. 
The  reader  is  referred  to  Gill  and  Murray  (1972)  for  further  details  concerning 
switches  in  formulae. 

An  advantage  of  carrying  out  the  estimation  of  the  optimal  k  at  the  starting 
point  is  that  an  initial  central-difference  approximation  to  the  gradient  is  thereby 
produced,  as  well  as  estimates  of  the  diagonal  elements  of  the  Hessian  matrix, 
which  may  be  used  to  initialise  the  Hessian  approximation  for  a  quasi-Newton 
method.  Retaining  the  same  finite-difference  interval  throughout  the  minimisa¬ 
tion  results  in  similar  truncation  errors  in  g*  and  9k+i,  which  is  beneficial  when 
computing  the  vector  yk  needed  to  perform  the  quasi-Newton  update.  Tins  latter 
property  implies  that  the  finite- difference  version  of  any  quasi-Newton  method 
that  displays  n-step  termination  with  exact  gradients  will  also  achieve  n-step 
termination  when  F  is  a  well-scaled  quadratic  function. 

A  word  of  warning  is  appropriate  at  this  point.  There  may  not  exist  a 
fixed  set  of  intervals  that  are  appropriate  throughout  the  range  of  points  where 
gradients  must  be  approximated  during  a  minimisation,  and  consequently,  the 
given  procedure  may  not  produce  a  sensible  set  of  intervals  for  every  point  sub¬ 
sequently  generated.  Nonetheless,  in  most  practical  applications  it  is  adequate 
to  examine  properties  of  the  function  in  some  detail  at  a  point  that  typifies  those 
for  which  the  gradient  will  be  required.  In  the  unlikely  event  that  a  minimisation 
algorithm  fails  because  of  a  poor  choice  of  finite-difference  interval,  it  can  be 
restarted  in  order  to  produce  intervals  more  suitable  for  the  neighbourhood  in 
which  failure  occurred. 


6.  Conclusions 

We  have  described  methods  for  estimating  reasonable  finite-difference  intervals 
for  several  finite-difference  formulae.  These  techniques  are  useful  not  only  in 
optimisation  algorithms,  but  in  other  circumstances  (such  as  sensitivity  analysis) 
where  accurate  approximations  to  derivatives  are  required.  Automatic  scaling 
procedures  (see,  e.g.,  Gill  et  a /.,  1980)  also  rely  on  the  use  of  finite-difference 
approximations  to  analyse  the  behavior  of  functions. 

The  cost  of  implementing  the  method  of  Section  5.2  (except  for  badly  scaled 
functions)  is  typically  only  2n  function  evaluations,  which  are  carried  out  only 
at  the  initial  point.  This  number  of  additional  evaluations  is  unlikely  to  be 
significant,  particularly  in  view  of  the  improved  efficiency  during  the  optimisa¬ 
tion;  for  example,  delaying  the  switch  to  central  differences  by  just  one  iteration 
will  tend  to  result  in  a  net  gain  in  efficiency.  For  badly  scaled  functions,  the 
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um  of  ft  well-chosen  finite-difference  interval  ms y  be  crucial  to  the  success  of  an 
algorithm. 

In  this  paper,  fie  hare  been  concerned  solely  with  the  unconstrained  prob¬ 
lem;  however,  a  similar  analysis  can  (and  should)  be  extended  to  constrained 
problems.  It  is  sometimes  thought  that  the  constrained  case  is  easier  because 
the  gradient  of  the  objective  function  is  not  necessarily  sero  at  the  solution. 
Nonetheless,  some  projection  of  the  gradient  does  decrease  to  sero,  and  the  ac¬ 
curacy  of  any  approximation  to  the  projected  gradient  is  critical  in  computing 
the  search  direction.  In  fact,  in  a  constrained  problem  where  derivatives  are  not 
available,  it  is  usually  more  efficient,  and  certainly  superior  in  terms  of  numeri¬ 
cal  stability,  to  approximate  the  projected  gradient  directly  by  finite  differences 
along  selected  vectors.  In  this  case,  it  is  clear  that  the  selection  of  a  reasonable 
finite-difference  interval  is  just  as  important  as  in  the  unconstrained  case. 
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